Building the AI layer inside an enterprise product

There is an obvious version of "add AI to your product" that we did not build. It involves a chat icon in the corner, a side drawer, a prompt box, and a model behind a thin API. It is what most enterprise software shipped in 2024 and what most users learned to ignore. The version we built at Klay Securities is different in almost every dimension. This is the field note on what we learned.

The brief: a securities firm with thirty relationship managers, each running a book of high-net-worth clients in an environment regulated by SEBI and partly by FINMA on the cross-border side. They wanted AI inside the product. They did not want a chatbot. They wanted the existing dashboard to know things, decide some, and surface the rest.

The first decision: inside, not beside

The most consequential choice was the simplest. The AI layer would live inside the existing product, sharing its data model and its surfaces, not as a separate experience users had to switch into. There would be no "AI mode." There would be no chat drawer. The product would simply do more, in the same places, with the same affordances.

This sounds obvious. It is not. Adding AI as a side panel is much easier to ship, much easier to scope, and much easier to demo. It is also much easier for users to ignore. The studios shipping side panels are seeing low adoption and explaining it away as "early days." The studios shipping inside are finding that the AI gets used without anyone having to go looking for it.

Concretely, the choice meant the AI features had to be designed by the same people designing the dashboard, written in the same codebase, deployed on the same cadence, and bound by the same access controls. The infrastructure team had to extend, not duplicate. The design team had to absorb a new vocabulary, not invent a new product.

Three classes of AI feature

Inside the product, three kinds of AI work earn their keep. We learned to distinguish them carefully because they have different latency budgets, different reliability requirements, and different consequences for getting them wrong.

Class 1 — Reads

Operations that observe and summarise. "What changed in this client's portfolio since I last looked." "What are the three things from today's news that affect my book." These are fast, cheap, and recoverable: if the summary is wrong, the user notices immediately and re-reads the underlying data. Latency budget: under 2 seconds. Failure mode: stale data or a hallucinated number. Mitigation: every claim links to the source row.
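A minimal sketch of the "every claim links to the source row" mitigation. It assumes a summary comes back as a list of claims, and each claim carries the ID and column of the row it quotes. `Claim`, `unsupported_claims`, the row IDs and the values are all hypothetical and do not come from Klay's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str           # sentence shown to the relationship manager
    source_row_id: str  # hypothetical ID of the row the claim is derived from
    column: str         # column of that row the claim quotes
    value: float        # number the claim asserts


def unsupported_claims(claims: list[Claim], rows: dict[str, dict[str, float]]) -> list[Claim]:
    """Return claims that have no source row, or whose number disagrees with it."""
    bad = []
    for claim in claims:
        row = rows.get(claim.source_row_id)
        if row is None or row.get(claim.column) != claim.value:
            bad.append(claim)
    return bad


rows = {"pos-17": {"weight_pct": 12.5}}
claims = [
    Claim("Equity weight is now 12.5%", "pos-17", "weight_pct", 12.5),
    Claim("Equity weight is now 14.0%", "pos-17", "weight_pct", 14.0),
]
print([c.text for c in unsupported_claims(claims, rows)])
# prints ['Equity weight is now 14.0%']
```

The same link that lets the UI render "show me the row" also gives a cheap mechanical check. A production version would probably compare numbers with a tolerance instead of exact equality.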

Class 2 — Drafts

Operations that produce a thing a human will edit. "Draft a Q2 review for this client." "Generate the call brief for this 10:30 meeting." These are slower, more expensive, and carry more weight because the user is more likely to ship them as-is. Latency budget: under 30 seconds in the background, with progressive UI. Failure mode: tone wrong, fact wrong, structure wrong. Mitigation: visible edit trail, side-by-side with the source, mandatory human approval before any external send.
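The "mandatory human approval before any external send" rule can be enforced in the send path itself, so it does not depend on the UI remembering it. The sketch below is built on assumptions: the `Draft` type, the state names and the `send_externally` hook are hypothetical. The rule that any edit resets an earlier approval is also an assumption and is not stated above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class DraftState(Enum):
    READY_FOR_REVIEW = "ready_for_review"
    APPROVED = "approved"


@dataclass
class Draft:
    body: str
    state: DraftState = DraftState.READY_FOR_REVIEW
    approved_by: Optional[str] = None
    # Visible edit trail: (editor, body before the edit).
    edits: list[tuple[str, str]] = field(default_factory=list)

    def edit(self, editor: str, new_body: str) -> None:
        self.edits.append((editor, self.body))
        self.body = new_body
        # Assumption: any edit invalidates an earlier approval.
        self.state = DraftState.READY_FOR_REVIEW
        self.approved_by = None

    def approve(self, analyst: str) -> None:
        self.state = DraftState.APPROVED
        self.approved_by = analyst


def send_externally(draft: Draft) -> str:
    """Hypothetical outbound hook: refuses anything a named human has not approved."""
    if draft.state is not DraftState.APPROVED or not draft.approved_by:
        raise PermissionError("external send requires human approval")
    return f"sent, approved by {draft.approved_by}"


draft = Draft("Q2 review for client C-1")
try:
    send_externally(draft)
except PermissionError as err:
    print(err)  # external send requires human approval
draft.approve("rm-07")
print(send_externally(draft))  # sent, approved by rm-07
```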

Class 3 — Acts

Operations that change state. "File this compliance flag." "Rebalance this position by 50 bps." These are the ones the regulator and the lawyer want to discuss. Latency does not matter — what matters is that the action is reversible, attributable, and audited. Failure mode: a bad action that the user did not consciously authorise. Mitigation: every act requires explicit human approval (no auto-execute), and every approved act writes a full reasoning log to a separate audit store the compliance team can read.
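A minimal sketch of the three properties named for acts: reversible, attributable, and audited. It also covers the no-auto-execute rule. `ProposedAct`, `execute`, and the `apply`/`undo` callables are hypothetical. A plain list stands in for the separate audit store, and the position values and reasoning text are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class ProposedAct:
    description: str           # e.g. "Rebalance position X by 50 bps"
    reasoning: str             # the agent's reasoning, kept for compliance
    apply: Callable[[], None]  # hypothetical: performs the state change
    undo: Callable[[], None]   # hypothetical: reverses it, so every act is reversible


def execute(act: ProposedAct, approver: Optional[str], audit_log: list[dict]) -> None:
    if not approver:
        # No auto-execute: without a named human approver, nothing happens.
        raise PermissionError("acts require explicit human approval")
    act.apply()
    # Attributable and audited: who approved what, when, and why the agent proposed it.
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "act": act.description,
        "approved_by": approver,
        "reasoning": act.reasoning,
    })


positions = {"X": 0.100}
act = ProposedAct(
    description="Rebalance position X by 50 bps",
    reasoning="Weight drifted below the client's target band",
    apply=lambda: positions.update(X=0.105),
    undo=lambda: positions.update(X=0.100),
)
audit_log: list[dict] = []
execute(act, approver="rm-07", audit_log=audit_log)
print(positions["X"], audit_log[0]["approved_by"])  # 0.105 rm-07
```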

The mistake we saw other teams make: treating Class 3 with the same UI patterns as Class 1. A summary that's wrong is recoverable. An action that's wrong is a compliance event.
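One way to stop Class 3 features from inheriting Class 1 patterns is to make the class an explicit part of each feature's configuration. The sketch below takes the latency budgets and approval points from the class descriptions. The type names, field names, and the `within_budget` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FeatureClass(Enum):
    READ = 1
    DRAFT = 2
    ACT = 3


class Gate(Enum):
    NONE = "none"                                   # reads: the user re-checks the source
    BEFORE_EXTERNAL_SEND = "before_external_send"   # drafts
    BEFORE_EXECUTION = "before_execution"           # acts: no auto-execute


@dataclass(frozen=True)
class ClassPolicy:
    latency_budget_s: Optional[float]  # None: latency is not the constraint
    gate: Gate                         # where a human must sign off


POLICIES = {
    FeatureClass.READ: ClassPolicy(2.0, Gate.NONE),
    FeatureClass.DRAFT: ClassPolicy(30.0, Gate.BEFORE_EXTERNAL_SEND),
    FeatureClass.ACT: ClassPolicy(None, Gate.BEFORE_EXECUTION),
}


def within_budget(cls: FeatureClass, elapsed_s: float) -> bool:
    budget = POLICIES[cls].latency_budget_s
    return budget is None or elapsed_s < budget
```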

The architecture, briefly

The AI Intelligence Layer at Klay has four parts. We will not name the model vendors — that conversation is a separate one — but the shape is the part that matters.

1. Context layer. A typed schema describing every object the AI is allowed to read: clients, portfolios, positions, news items, internal notes, regulatory bulletins. Every call to the model goes through this layer, which selects the minimum-necessary context, redacts PII according to client-level rules, and stamps a context hash on the prompt.

2. Reasoning layer. The model itself, with a smaller "router" model in front to triage which questions can be answered without the bigger one. Wherever structure is feasible, outputs are structured JSON, not freeform prose. Freeform text comes in only at the very end, when the draft text is what the user will see.

3. Verification layer. A second pass (sometimes a different model, sometimes a deterministic rules engine, sometimes both) that checks the output for hallucinated numbers, policy violations, and tone problems. Outputs that fail verification are quietly dropped and retried, or marked low-confidence in the UI.

4. Audit store. A separate, append-only event store that records every prompt, every retrieval, every output, and every human decision. The live product writes to it but never reads from it. Read access is scoped to the compliance team. Used heavily by the auditor.
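The context layer has three jobs: select the minimum-necessary context, redact PII according to client-level rules, and stamp a context hash on the prompt. A minimal sketch of all three follows. The schema contents, the PII list, `build_context`, and the choice of SHA-256 over canonical JSON are assumptions made for illustration.

```python
import hashlib
import json

# Hypothetical typed schema: the fields of each object type the model may ever read.
ALLOWED_FIELDS = {
    "client": {"client_id", "name", "risk_profile", "jurisdiction"},
    "position": {"client_id", "instrument", "weight_pct"},
}
# Hypothetical: fields treated as PII, redacted when the client's rules say so.
PII_FIELDS = {"name"}


def build_context(obj_type: str, record: dict, needed: set[str], redact_pii: bool) -> tuple[dict, str]:
    """Return (context, context_hash) for one object."""
    # Minimum-necessary: fields the task asked for, that the schema allows, that exist.
    keep = sorted(needed & ALLOWED_FIELDS[obj_type] & set(record))
    context = {}
    for key in keep:
        value = record[key]
        if redact_pii and key in PII_FIELDS:
            value = "[REDACTED]"
        context[key] = value
    # Canonical JSON (sorted keys) so identical contexts always hash identically.
    canonical = json.dumps({"type": obj_type, "fields": context}, sort_keys=True)
    return context, hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = {"client_id": "C-1", "name": "A. Client", "jurisdiction": "IN", "pan": "XXXXX0000X"}
ctx, ctx_hash = build_context("client", record, {"client_id", "name", "pan"}, redact_pii=True)
print(ctx)  # {'client_id': 'C-1', 'name': '[REDACTED]'}
```

The `pan` field is requested but never reaches the model, because the schema does not allow it. Stamping the hash on the prompt lets the audit store tie each output back to exactly what the model saw.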

The four layers are decoupled. The reasoning model can be swapped without changing the verification layer. The verification rules can be tightened without redeploying the context schema. This decoupling is what makes the system maintainable against a moving model landscape.
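The decoupling can be expressed as narrow interfaces. The sketch below uses `typing.Protocol` so that any reasoner or verifier with the right method can be swapped in. It also shows the verification behaviour described in the list: a failed output is retried, and if every attempt fails it is returned flagged as low-confidence. Every name here is hypothetical. `ScriptedReasoner` is a stand-in for a real model call, and a list stands in for the audit store.

```python
from dataclasses import dataclass
from typing import Iterator, Protocol


class Reasoner(Protocol):
    def run(self, context: dict) -> dict: ...


class Verifier(Protocol):
    def check(self, context: dict, output: dict) -> list[str]: ...  # empty list = pass


@dataclass
class Result:
    output: dict
    low_confidence: bool


def answer(context: dict, reasoner: Reasoner, verifier: Verifier,
           audit: list[dict], max_attempts: int = 2) -> Result:
    output: dict = {}
    for attempt in range(1, max_attempts + 1):
        output = reasoner.run(context)
        problems = verifier.check(context, output)
        audit.append({"attempt": attempt, "output": output, "problems": problems})
        if not problems:
            return Result(output, low_confidence=False)
    # Every attempt failed verification: show the last output, flagged low-confidence.
    return Result(output, low_confidence=True)


class NumbersMatchContext:
    """Deterministic rule: every number the model cites must equal the context value."""
    def check(self, context: dict, output: dict) -> list[str]:
        return [k for k, v in output.get("numbers", {}).items() if context.get(k) != v]


class ScriptedReasoner:
    """Stand-in for a model call: returns canned outputs in order."""
    def __init__(self, outputs: list[dict]) -> None:
        self._outputs: Iterator[dict] = iter(outputs)

    def run(self, context: dict) -> dict:
        return next(self._outputs)


audit: list[dict] = []
model = ScriptedReasoner([{"numbers": {"weight_pct": 14.0}}, {"numbers": {"weight_pct": 12.5}}])
result = answer({"weight_pct": 12.5}, model, NumbersMatchContext(), audit)
print(result.low_confidence, len(audit))  # False 2
```

Swapping the model means passing a different object with a `run` method. Tightening the rules means passing a different verifier. Neither change touches the other.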


The regulatory edge

Working in regulated finance changes the AI conversation in three ways most engineering teams underestimate.

First — the model itself cannot be a regulator-approvable decision-maker. SEBI does not approve LLMs; it approves processes. That means every AI output has to live inside a human-approved decision process, with the human's name on it, and the architecture has to make that approval easy, fast, and reliably traceable.

Second — training data is a question you have to settle explicitly. Most enterprise plans now include no-training clauses, and we confirmed those terms in writing with every vendor. We also run sensitive operations against a model deployed in our own infrastructure, where the prompts never leave the firm's environment.

Third — the cross-border line matters. Klay's UK clients are covered by different rules than its India clients. The context layer enforces this: a model handling a UK client never receives an India-only data point, and vice versa. The enforcement is at the data layer, not the prompt layer, because prompt-level enforcement is wishful thinking.
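Enforcing the cross-border line at the data layer means filtering what can be fetched, not instructing the model. A minimal sketch follows. It assumes each data point is tagged with the jurisdictions where it may be used. The `DataPoint` type, the tag values, and `fetch_for_client` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    key: str
    jurisdictions: frozenset[str]  # hypothetical tags: where this data point may be used


def fetch_for_client(client_jurisdiction: str, store: list[DataPoint]) -> list[DataPoint]:
    # Enforced where data is fetched, before any prompt is assembled. A data point
    # with no tags matches no client, so untagged data fails closed.
    return [dp for dp in store if client_jurisdiction in dp.jurisdictions]


store = [
    DataPoint("india_only_bulletin", frozenset({"IN"})),
    DataPoint("uk_only_bulletin", frozenset({"UK"})),
    DataPoint("shared_market_note", frozenset({"IN", "UK"})),
    DataPoint("untagged_note", frozenset()),
]
print([dp.key for dp in fetch_for_client("UK", store)])
# prints ['uk_only_bulletin', 'shared_market_note']
```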

Where the line between agent and analyst sits

The hardest design question on the project. Where does the agent stop and the analyst start?

Our answer: the agent stops at the decision. Everything up to the decision is fair game. Reading positions, summarising portfolios, flagging anomalies, drafting reviews, suggesting actions, monitoring 24/7 — all the agent's work. Everything from the decision forward — approving an action, sending a message externally, executing a trade, filing a regulatory document — is the analyst's work.

This sounds like a sharp line. In practice it is not: there are dozens of decisions per day, each with its own gravity, and the UI has to make the line feel natural. The convention we settled on: any output the agent produces gets a green "Approve" / amber "Edit" / red "Discard" affordance. The colour is the gravity. The hand on the button is always the analyst's.
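The rule "the agent stops at the decision" can be written down as a permission check. The operation names below paraphrase the lists above, but the identifiers, the two-role model, and `may_perform` are hypothetical.

```python
from enum import Enum


class Affordance(Enum):
    # Every agent output carries all three; the colour signals the gravity.
    APPROVE = "green"
    EDIT = "amber"
    DISCARD = "red"


# Everything up to the decision: the agent's work.
AGENT_OPS = {"read_positions", "summarise_portfolio", "flag_anomaly",
             "draft_review", "suggest_action", "monitor"}
# The decision and everything after it: the analyst's work.
ANALYST_OPS = {"approve_action", "send_external_message",
               "execute_trade", "file_regulatory_document"}


def may_perform(actor: str, op: str) -> bool:
    if op in AGENT_OPS:
        return True  # preparatory work: agent or analyst
    if op in ANALYST_OPS:
        return actor == "analyst"  # the hand on the button is always the analyst's
    return False  # unknown operations are refused by default
```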

Things we got wrong

Two we will not repeat.

We overinvested in the chat affordance early. The first version had a chat panel because that was the obvious thing. Within a month it was clear the panel drew 5% of the engagement that the inline features did. We removed it. The product got better.

We trusted streaming for too long. The model streams output token-by-token, which feels responsive but produces a UI that flickers between half-complete states. For Class 2 work where the user reads the output, we now show a "drafting…" state and only reveal the final text when it is verified. Slightly slower-feeling, much more trustworthy.
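The "drafting…, then reveal only verified text" pattern amounts to consuming the stream off-screen. The sketch below is minimal. It assumes the UI exposes a single `show` callback and the verifier returns a boolean. The function name and the "draft needs review" fallback state are hypothetical.

```python
from typing import Callable, Iterable


def reveal_when_verified(tokens: Iterable[str],
                         verify: Callable[[str], bool],
                         show: Callable[[str], None]) -> bool:
    show("drafting…")           # placeholder state instead of half-complete text
    text = "".join(tokens)      # consume the whole token stream off-screen
    if verify(text):
        show(text)              # the user only ever sees finished, verified text
        return True
    show("draft needs review")  # hypothetical fallback when verification fails
    return False


shown: list[str] = []
reveal_when_verified(iter(["Q2 ", "review"]), lambda t: "review" in t, shown.append)
print(shown)  # ['drafting…', 'Q2 review']
```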

The outcome, in operator's language

Six months in: the average RM at Klay is running 3.4× the book they were running a year before, with no measurable degradation in client satisfaction scores. Compliance flags raised by the system are now the source of 28% of all flags the human team eventually escalates. The audit team has not raised a single concern about the AI's reasoning trail — partly because the architecture made the trail readable to them from day one.

Two metrics we are still watching closely: hallucination rate (currently 0.7% by our internal audit), and analyst trust (high among power users, lower among new joiners — we have work to do on onboarding).

Closing

The teams shipping AI inside enterprise products are doing the same work in 2026 that the teams shipping mobile-first products were doing in 2010: figuring out a new architecture, a new UI language, and a new regulatory shape, while everyone else builds clones of the wrong thing. The studios that get the architecture right early will be the ones whose products survive the next platform shift. The studios that ship chatbots will be discovering, slowly, that nobody wanted them.

The AI layer is not a feature. It is a layer. Treat it that way.
